[Relax][TOPI] Add relax.vision.multibox_transform_loc for SSD/TFLite box decode#18942
[Relax][TOPI] Add relax.vision.multibox_transform_loc for SSD/TFLite box decode#18942tlopex merged 4 commits intoapache:mainfrom
Conversation
…box decode Introduce relax.vision.multibox_transform_loc with MultiboxTransformLocAttrs: decode center-size offsets against ltrb priors, softmax on class logits, and optional clip, threshold masking, and background score zeroing. Register the C++ op with FInferStructInfo checks for shapes and dtypes (including batch and 4*N consistency). Legalize to topi.vision.multibox_transform_loc. Add tests for struct inference, invalid inputs, Legalize+e2e on LLVM, attribute branches, and TVMScript roundtrip. Add a standalone numpy reference under topi/testing (not exported from tvm.topi.testing to avoid pulling scipy). Update TFLite frontend NotImplementedError text for DETECTION_POSTPROCESS and NON_MAX_SUPPRESSION_V5 to note multibox is available and link tracking issue apache#18928.
|
cc @tlopex |
There was a problem hiding this comment.
Code Review
This pull request introduces the multibox_transform_loc operator to TVM Relax, enabling SSD and TFLite-style box decoding. The implementation includes the Relax operator definition, C++ attributes, struct info inference, and a TOPI implementation using TE. Additionally, a NumPy reference implementation and comprehensive unit tests for correctness and TVMScript parsing are provided. Review feedback suggests simplifying nested conditionals in the TOPI implementation for better readability and removing redundant type casts in the reference implementation.
tlopex
left a comment
There was a problem hiding this comment.
Some issues need to be improved
- There are still a couple of lines over the 100-column limit in
multibox_transform_loc.cc. Also, the UTF-8→in the argument description should probably be replaced with->orto. topi/testing/multibox_transform_loc_python.pylooks unused right now — the tests already carry an identical inline copy. Either reuse the helper fromtopi/testingor remove the dead file.test_multibox_transform_loc_wrong_shape_relationdoes not currently hit the cross-shape validation, sinceloc_pred.shape[1] == 19already fails the divisibility check. Please add a case like24to cover the intended path.- The LLVM E2E tests should be marked with
@tvm.testing.requires_llvmso they skip cleanly on shards without LLVM. - All current E2E tests use variances = (1.0, 1.0, 1.0, 1.0), so the variance scaling logic is effectively a no-op in test coverage. It would be good to add a case with non-unity variances (e.g. (0.1, 0.1, 0.2, 0.2)) to exercise that path end-to-end.
|
cc @tlopex |
- multibox_transform_loc.cc: wrap long lines, ASCII loc_pred doc (no Unicode arrow). - vision.h: wrap MultiboxTransformLoc clip reflection line to 100 cols. - Remove unused topi/testing/multibox_transform_loc_python.py; keep numpy ref in tests. - test_op_vision: loc_dim=24 vs 4*N infer case; @tvm.testing.requires_llvm on LLVM e2e (multibox + all_class_nms); add e2e with non-unity variances.
tlopex
left a comment
There was a problem hiding this comment.
Overall LGTM,
A few minor follow-ups:
- Please consider documenting that shape cross-checking is partial when
cls_shapeis unknown. - Please consider adding a short note on the expected
variancesrange, since very large values could overflowte.exp(...).
|
@tlopex Thanks for the follow-ups — addressed in the latest commit. We documented that when cls_pred has unknown shape, FInferStructInfo only returns generic rank-3 outputs and skips N-based cross-checks (e.g. loc_pred.shape[1] == 4*N and anchor.shape[1] == N) in a Doxygen note on InferStructInfoMultiboxTransformLoc, the op .describe(...) string, and the Python op docstring Notes. We also added a short warning that very large w/h variances can overflow exp in the half-height/width decode path, in MultiboxTransformLocAttrs reflection text, the Python variances parameter docs, and the same op description. |
|
Thank you! LGTM |
Introduce relax.vision.multibox_transform_loc with MultiboxTransformLocAttrs: decode center-size offsets against ltrb priors, softmax on class logits, and optional clip, threshold masking, and background score zeroing. Register the C++ op with FInferStructInfo checks for shapes and dtypes (including batch and 4*N consistency). Legalize to topi.vision.multibox_transform_loc.
Add tests for struct inference, invalid inputs, Legalize+e2e on LLVM, attribute branches, and TVMScript roundtrip. Add a standalone numpy reference under topi/testing (not exported from tvm.topi.testing to avoid pulling scipy).
Update TFLite frontend NotImplementedError text for DETECTION_POSTPROCESS and NON_MAX_SUPPRESSION_V5 to note multibox is available and link tracking issue #18928.